Finding structure in text, genome and other symbolic sequences

Author

  • Ted Emerson Dunning
Abstract

Acknowledgments

I would like to particularly thank my advisor, Yorick Wilks. Over the years I have known him, Yorick has always been insightful, cheerful and generous. His ability to understand and constructively critique research over an extraordinarily wide range of topics has been extremely helpful, both professionally and academically. Moreover, his willingness to consider and even nurture points of view which run counter to his own intuitions is extraordinary in a world filled with entirely too many ideologues. Even more extraordinary is his willingness to adopt techniques that have proven useful regardless of his initial assessment of them. This flexibility is the mark of a true scientist.

The members of my committee, Peter Willett, Robert Gaizauskas and Steve Renals, also provided valuable advice, assistance and oversight. Peter has been especially helpful by sharing his wide knowledge of and historical perspective on the field of information retrieval. In addition, he has been willing to carefully edit this work to graciously help me to completely avoid the use of split infinitives. Robert has spent more time helping me and examining this work than could ever have been expected.

Outside of the university, Ellen Friedman also provided enormous support in the form of advice and careful review of this thesis. Her guidance was critical to the quality of the chapters on genomic sequence analysis. Her advice and insight throughout this work have helped me make this thesis far better than it otherwise could have been. Jamie Callan reviewed the chapter on document routing and provided helpful comments. At least as important, however, he supervised the preparation of the InRoute comparison results, which provided a connection between my results in chapter 7 and the IR literature in general. Owen White extracted and provided data for the genomic sequence analysis chapters. Only those who have worked with the sequence databases can know how helpful this was.

My current employer, Aptex Software, and the parent company, HNC Software, are to be commended for their encouragement and support during the preparation of this thesis. They have provided financial support, access to computational resources and the Convectis system. This support has been crucial to the completion of this thesis.

Finally, my examiners, Keith van Rijsbergen and Steve Renals, are taking a substantial amount of their time to judge this work. I thank them.

Summary

The statistical methods derived and described in this thesis provide new ways …


Similar resources

Finding Exact and Solo LTR-Retrotransposons in Biological Sequences Using SVM

Finding repetitive subsequences in a genome is a challenging problem in bioinformatics. Many approaches have been proposed to solve it, and they can be divided into library-based and de novo methods. Library-based methods use predetermined repetitive genomic subsequences, whereas library-less methods attempt to discover repetitive subsequences by an analytical approach...


Profile of Eight Prophage Sequences Present in the Genomes of Different Acinetobacter baumannii Strains

Background and Objective: Prophage sequences are major contributors to interstrain variations within the same bacterial species. Acinetobacter baumannii is a gram-negative bacterium that causes a wide range of nosocomial infections, especially in intensive care unit inpatients. Prophage sequences constitute a considerable proporti...


Gene Family: Structure, Organization and Evolution

Gene families are groups of homologous genes that share very similar sequences and may have identical functions. Members of gene families may be found in tandem repeats or interspersed throughout the genome. These sequences are copies of ancestral genes that have undergone changes. The multiple copies of each gene in a family were constructed based on gene duplicati...


Bioinformatics Study and Investigation of the Expression Pattern of Several Important Genes Involved in Glycyrrhizin Synthesis of Glycyrrhiza glabra L. in Autumn and Spring Seasons

Glycyrrhiza is an important medicinal plant that is in danger of extinction. Finding accessions with a higher glycyrrhizic acid content is very important in breeding programs. Functional genomics methods such as EST sequencing make it possible to identify consensus gene families among the studied species and to interpret the genome. In this research, 55960 EST sequences of ...


Phylogenetic relationships of Iranian Infectious Pancreatic Necrosis Virus (IPNV) based on deduced amino acid sequences of genome segment A and B cDNA

Infectious Pancreatic Necrosis Virus (IPNV) is the causal agent of a highly contagious disease that affects many species of fish and shellfish. This virus causes economically important disease in farmed rainbow trout, Oncorhynchus mykiss, in Iran, which is often associated with the transmission of pathogens from European sources. In this study, moribund rainbow trout fry were collected during...




Journal:
  • CoRR

Volume abs/1207.1847

Publication date 1998